39 research outputs found

    AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

    An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including the audio capture devices and environments, is presented in detail. The preparation of the related resources, including the transcriptions and lexicon, is described. The corpus is released with a Kaldi recipe. Experimental results imply that the quality of the audio recordings and transcriptions is promising. Comment: Oriental COCOSDA 201

    Convolutional Pitch Target Approximation Model for Speech Synthesis

    In this paper, we investigate pitch contour modelling in speech synthesis based on segmental units. A convolutional pitch target approximation model is proposed. This model allows joint stochastic modelling of the framewise pitch and the pitch contour of longer units, whose intuitive relations are revealed by a convolutional target approximation filter. The pitch contour is stylized by a linear representation called the pitch target. In the synthesis stage, the likelihoods of the framewise model and the pitch target model are jointly maximized using a Toeplitz matrix representing the discrete convolutional filter.
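    The abstract's use of a Toeplitz matrix to represent a discrete convolutional filter can be illustrated with a minimal sketch. It only shows the general identity that convolving with a filter equals multiplying by a Toeplitz matrix; it is not the paper's model. The filter taps `h`, the input `x`, and the helper name `convolution_matrix` are assumptions made for illustration.

```python
import numpy as np


def convolution_matrix(h, n):
    """Build the (n + len(h) - 1) x n Toeplitz matrix T with T @ x == np.convolve(h, x).

    Hypothetical helper for illustration; column j holds the filter taps
    shifted down by j rows, so every diagonal of T is constant.
    """
    h = np.asarray(h, dtype=float)
    m = len(h)
    T = np.zeros((n + m - 1, n))
    for j in range(n):
        T[j:j + m, j] = h
    return T


# Assumed example values, not taken from the paper.
h = np.array([0.5, 0.3, 0.2])        # filter taps
x = np.array([1.0, 2.0, 3.0, 4.0])   # input sequence

T = convolution_matrix(h, len(x))
y_matrix = T @ x                     # convolution as a matrix product
y_direct = np.convolve(h, x)         # reference: direct full convolution

print(np.allclose(y_matrix, y_direct))
```

    Writing the filter as a matrix makes the convolution a linear map, which is what lets a likelihood over the filtered and unfiltered quantities be expressed and maximized jointly in matrix form.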

    Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

    Prosody plays an important role in both the identification and the synthesis of emotional speech. Prosodic features such as pitch are usually estimated and altered at the segmental level, based on short windows of speech in which the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustic parameters when synthesizing emotional speech. To convert neutral speech to emotional speech from the same user, it may be better to alter the pitch parameters at the suprasegmental level, such as the syllable level, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that pitch transformation in a neutral-to-emotional voice conversion system may yield better output speech quality if the transformations are performed at the suprasegmental (syllable) level rather than at the frame level. Subjective evaluation results are presented to show whether naturalness, speaker similarity, and emotion recognition differ between the two approaches.

    Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture

    Current HMM-based low bit rate speech coding systems work with phonetic vocoders. Pitch contour coding (at the frame or phoneme level) is usually fairly orthogonal to other speech coding parameters. We assume in our work that the speech signal contains suprasegmental cues. Hence, we present encoding of the pitch at the syllable level, used in the framework of a recognition/synthesis speech coder with a phonetic vocoder. The results imply that high-accuracy pitch contour reconstruction with negligible speech quality degradation is possible. The proposed pitch encoding technique operates at 30-35 bits per second.

    Incremental Syllable-Context Phonetic Vocoding

    Current very low bit rate speech coders are, due to complexity limitations, designed to work off-line. This paper investigates speech coding that operates in real time and incrementally (i.e., encoded speech depends only on already-uttered speech, without the need for future speech information). Since human speech communication is asynchronous (i.e., different information flows are processed simultaneously), we hypothesised that such an incremental speech coder should also operate asynchronously. To accomplish this task, we describe speech coding that reflects human cortical temporal sampling, which packages information into units of different temporal granularity, such as phonemes and syllables, in parallel. More specifically, a phonetic vocoder (cascaded speech recognition and synthesis systems) extended with syllable-based information transmission mechanisms is investigated. Two main aspects are evaluated in this work: synchronous and asynchronous coding. Synchronous coding refers to the case in which the phonetic vocoder and the speech generation process depend on the syllable boundaries during encoding and decoding, respectively. Asynchronous coding refers to the case in which the phonetic encoding and speech generation processes are carried out independently of the syllable boundaries. Our experiments confirmed that asynchronous incremental speech coding performs better in terms of intelligibility and overall speech quality, mainly due to better alignment of the segmental and prosodic information. The proposed vocoding operates at an uncompressed bit rate of 213 bits/sec and achieves an average communication delay of 243 ms.

    Immunometabolism changes in fibrosis: from mechanisms to therapeutic strategies

    Immune cells are essential for initiating and developing the fibrotic process by releasing cytokines and growth factors that activate fibroblasts and promote extracellular matrix deposition. Immunometabolism describes how metabolic alterations affect the function of immune cells and how inflammation and immune responses regulate systemic metabolism. Disturbed immune cell function and the interactions of immune cells with other cells in the tissue microenvironment lead to the origin and advancement of fibrosis. Understanding the dysregulated metabolic alterations and the interactions between fibroblasts and immune cells is critical for providing new therapeutic targets for fibrosis. This review provides an overview of recent advances in the pathophysiology of fibrosis from the perspective of immunometabolism, highlighting the altered metabolic pathways in critical immune cell populations and the impact of inflammation on fibroblast metabolism during the development of fibrosis. We also discuss how this knowledge could be leveraged to develop novel therapeutic strategies for treating fibrotic diseases.

    A Safety Evaluation-based Image Quality Method for Roads and Bridges

    A computer-aided automatic safety evaluation method is proposed, based on quality evaluation of digital images of roads or bridges and other image information collected by highway monitoring devices. Images of qualified roads or bridges are selected to form a reference image database, and a reference image sequence and an evaluation image sequence are established separately. Then, by combining the peak signal-to-noise ratio (PSNR) with the human visual characteristic information entropy, a safety evaluation function with dynamic weights is obtained. Finally, the evaluation algorithm compares the similarity between evaluation images and reference images to judge the quality of the roads or bridges and to obtain a sequence of evaluation parameters. If the value of an evaluation parameter is greater than the threshold, the road or bridge quality has changed greatly, and manual inspection is therefore required. The experimental results show that the evaluation is consistent with the subjective perception of human vision, and that the proposed method has a high degree of automation.
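    The two ingredients named in the abstract, PSNR and an information entropy measure, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard PSNR definition for 8-bit images and plain Shannon entropy of the grey-level histogram as a stand-in for the "human visual characteristic information entropy", whose exact form the abstract does not give. The dynamic weighting and the threshold are not specified there and are not reproduced here. The function names and the sample images are hypothetical.

```python
import numpy as np


def psnr(reference, evaluated, max_value=255.0):
    """Standard peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=float)
    ev = np.asarray(evaluated, dtype=float)
    mse = np.mean((ref - ev) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def entropy(image, levels=256):
    """Shannon entropy (bits) of the grey-level histogram of an 8-bit image.

    Assumption: used here as a simple stand-in for the entropy measure
    mentioned in the abstract.
    """
    counts = np.bincount(np.asarray(image, dtype=np.uint8).ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


# Hypothetical reference image and an evaluation image with one changed pixel.
reference = np.full((4, 4), 100, dtype=np.uint8)
evaluated = reference.copy()
evaluated[0, 0] = 110

print(psnr(reference, evaluated))
print(entropy(reference), entropy(evaluated))
```

    A lower PSNR or a larger entropy difference relative to the reference indicates a bigger change in the image; how the paper combines the two into a single evaluation parameter is left open here.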
